Neural machine translation corpus expansion method based on language similarity mining
Can LI, Yating YANG, Yupeng MA, Rui DONG
Journal of Computer Applications    2021, 41 (11): 3145-3150.   DOI: 10.11772/j.issn.1001-9081.2020122039

To address the scarcity of labeled data resources in machine translation tasks for low-resource languages, a neural machine translation corpus expansion method based on language similarity mining was proposed. Firstly, Uyghur and Kazakh were treated as a similar language pair and their corpora were mixed. Then, three segmentation schemes were applied to the mixed corpus to explore the similarity between Kazakh and Uyghur in depth: Byte Pair Encoding (BPE), syllable segmentation, and BPE on top of syllable segmentation. Finally, the "Begin-Middle-End (BME)" sequence tagging method was introduced to tag the segmented syllables in the corpus, in order to eliminate some of the ambiguities caused by syllable input. Experimental results on the CWMT2015 Uyghur-Chinese and Kazakh-Chinese parallel corpora show that, compared with a baseline model trained without special corpus processing and a model trained on a BPE-processed corpus, the proposed method increases the Bilingual Evaluation Understudy (BLEU) score by 9.66 and 4.55 respectively for Uyghur-Chinese translation, and by 9.44 and 4.36 respectively for Kazakh-Chinese translation. The proposed scheme achieves cross-language neural machine translation from Uyghur and Kazakh to Chinese, improves the translation quality of Uyghur-Chinese and Kazakh-Chinese machine translation, and can be applied to corpus processing of Uyghur and Kazakh.
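
The BME tagging step described in the abstract can be illustrated with a minimal sketch. It assumes the syllable segmentation has already been done elsewhere and that each word arrives as a list of syllables. The function name `bme_tag`, the `_B`/`_M`/`_E` suffix format, and the choice to leave single-syllable words untagged are all assumptions made for illustration; the abstract does not specify these details, so this is not the authors' implementation.

```python
from typing import List


def bme_tag(words: List[List[str]], sep: str = "_") -> List[str]:
    """Attach a Begin/Middle/End position tag to each pre-segmented syllable.

    `words` is a sentence given as a list of words, each word being a list
    of syllables. The tag format (syllable + sep + tag) is hypothetical.
    """
    tagged: List[str] = []
    for syllables in words:
        n = len(syllables)
        for i, syl in enumerate(syllables):
            if n == 1:
                # Assumption: a single-syllable word needs no position tag.
                tagged.append(syl)
            elif i == 0:
                tagged.append(f"{syl}{sep}B")   # first syllable of the word
            elif i == n - 1:
                tagged.append(f"{syl}{sep}E")   # last syllable of the word
            else:
                tagged.append(f"{syl}{sep}M")   # any syllable in between
    return tagged


# Placeholder Latin-script syllables, used only to show the tag layout.
sentence = [["ki", "tab", "lar"], ["bar"]]
print(" ".join(bme_tag(sentence)))
```

With position tags attached, the same syllable string appearing at the start of one word and at the end of another becomes two distinct tokens, which is one way the ambiguity mentioned in the abstract could be reduced.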
